Talk:Vector database
This article is rated Start-class on Wikipedia's content assessment scale.
Why was the original content removed from the article?
It seemed that the history at:
- https://en.wikipedia.org/w/index.php?title=Vector_database&oldid=1159882107
- https://en.wikipedia.org/w/index.php?title=Vector_database&oldid=1176100870
was a good start for the article, but it was stripped down to a one-sentence introduction? Oneequalsequalsone (talk) 23:02, 4 October 2023 (UTC)
- The first one does not follow Wikipedia's style requirements, and the second one does not use WP:RS. I propose that we start rewriting it based on good sources. PhotographyEdits (talk) 13:05, 5 October 2023 (UTC)
- The first one was adequate for a stub article. Proposals of rewriting in the future are no substitute for even mediocre content; the article as it stands is worse than useless. 2001:5A8:460E:5E00:79EF:AD12:BBB8:D535 (talk) 22:29, 26 October 2023 (UTC)
List notability
I just stumbled into this article and noticed the "list of vector databases" mostly consists of entries with only primary sources. Entries here really should be established as notable for inclusion. A good rule of thumb on this is having at least one reliable secondary source discussing it (and not in passing). For more information, see NLIST. StereoFolic (talk) 21:25, 29 October 2023 (UTC)
- Okay, I've finished going through them and verifying notability of existing entries. The list should be in better shape now, but further eyes are appreciated. The list is also definitely missing a bunch of entries. StereoFolic (talk) 21:53, 29 October 2023 (UTC)
- Thanks, appreciated! PhotographyEdits (talk) 00:20, 30 October 2023 (UTC)
Add OpenSearch
This edit request by an editor with a conflict of interest has now been answered.
Here is a draft entry for OpenSearch including both primary and secondary sources. Since I work on OpenSearch at AWS, I am mindful of WP:COI, and would appreciate it if someone else would insert this. --Macrakis (talk) 21:07, 7 February 2024 (UTC)
| Name | License |
|---|---|
| OpenSearch[1][2][3] | Apache License 2.0[4] |
- ^ "Using OpenSearch as a Vector Database". OpenSearch.org. 2023-08-02. Retrieved 2024-02-07.
- ^ Pan, James Jie; Wang, Jianguo; Li, Guoliang (2023-10-21), Survey of Vector Database Management Systems, doi:10.48550/arXiv.2310.14021, retrieved 2024-02-07
- ^ "AWS debuts new AI-powered data management and analysis tools". SiliconANGLE. 2023-07-26. Retrieved 2024-02-07.
- ^ "OpenSearch license". GitHub.
@StereoFolic and PhotographyEdits: Please take a look at the above proposed edit. Thanks, --Macrakis (talk) 21:20, 7 February 2024 (UTC)
By the way, Zilliz's managed service offering Cardinal supposedly now includes proprietary enhancements to Milvus, so it probably should have a separate line item again. And (COI alert!) perhaps the Amazon OpenSearch Service should also have a line item as the managed service version of OpenSearch. --Macrakis (talk) 23:08, 7 February 2024 (UTC)
- I've just added the edit, thanks for going through the proper disclosure and request process. I'm hesitant about adding entries for hosted options, since to me this list is more about database systems themselves, not vendors offering hosted versions of them (even with fairly minor changes). Anyone interested in finding hosted services for these databases can easily find that information with an online search. StereoFolic (talk) 15:42, 10 February 2024 (UTC)
- "Anyone interested in finding hosted services for these databases can easily find that information with an online search." Well, there is also such a list for VPN services. It just depends on the coverage in third-party reliable sources. If there is extensive coverage, we should definitely cover it in Wikipedia too. But I don't think it is there yet. PhotographyEdits (talk) 22:09, 11 February 2024 (UTC)
Faiss
Dtunkelang, I just reviewed your source for Faiss being a database, not a library, but I'm still unsure about this one. The blog post lists it in its '5 best vector databases', but the description provided within simply describes it as a library, and in any case I'm unsure this blog would qualify as a reliable source. We could potentially consider the library an in-memory database, but it seems like a stretch and a description the developers themselves don't take on. StereoFolic (talk) 12:02, 14 February 2024 (UTC)
- I am unpersuaded, since I don't see any way in which Faiss fails to satisfy the definition of a vector database in the post. What criterion, in your view, does it fail to meet? Dtunkelang (talk) 18:53, 14 March 2024 (UTC)
- My understanding is that databases, pretty much by definition, are responsible for storing data. If I understand correctly, Faiss doesn't store data, it only queries data provided to it. If we consider any library that loads vectors and manipulates them to be a database, we get lots of obviously wrong labels like calling tensorflow and numpy vector databases. StereoFolic (talk) 22:31, 14 March 2024 (UTC)
- Faiss stores data -- you have to populate a Faiss index in order to query it. I've stored data in a Faiss index myself -- that's the only way to use it. From https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/: "Faiss (both C++ and Python) provides instances of Index. Each Index subclass implements an indexing structure, to which vectors can be added and searched." Dtunkelang (talk) 18:32, 17 March 2024 (UTC)
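For readers following the Faiss thread, here is a minimal sketch of what "an indexing structure, to which vectors can be added and searched" could look like. It does not use Faiss and is not Faiss's API: `FlatIndex`, `add`, and `search` are hypothetical names chosen for illustration, and the brute-force Euclidean search is an assumption. The vectors are held in memory after `add`, which is the sense of "stores data" argued above, but this sketch writes nothing to disk, which is the objection raised on the other side.

```python
import math
from typing import List, Tuple


class FlatIndex:
    """Hypothetical in-memory index (illustration only, not the Faiss API)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._vectors: List[List[float]] = []  # held in memory only

    def add(self, vectors: List[List[float]]) -> None:
        """Copy the given vectors into the index."""
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError(f"expected {self.dim} dimensions, got {len(v)}")
            self._vectors.append(list(v))

    def search(self, query: List[float], k: int) -> List[Tuple[int, float]]:
        """Return up to k (position, distance) pairs, nearest first."""
        scored = [(i, math.dist(query, v)) for i, v in enumerate(self._vectors)]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k]


index = FlatIndex(dim=2)
index.add([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
hits = index.search([0.9, 1.2], k=2)
print(hits)  # positions of the two nearest stored vectors, with distances
```

Whether an object like this counts as a database or a library is the open question in this thread; the sketch only shows the add-then-search behaviour both sides describe.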